[fix](uniform function) fix constant argument handling and use ColumnView #63076
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
/review |
There was a problem hiding this comment.
Reviewed PR #63076 against the Doris code-review checklist. I found one blocking correctness issue in the BE scalar function constant-argument path.
Critical checkpoints:
- Goal/test: The PR tries to fix
uniformconstant argument handling, but the all-constant execution path can still execute with inconsistent temporary column sizes and is not proven by a regression test. - Scope: The code change is small, but it changes framework metadata with non-local effects in
PreparedFunctionImpl::default_implementation_for_constant_arguments. - Concurrency/lifecycle/config/compatibility/persistence/data writes: Not applicable; this is a stateless BE scalar function change with no new config, storage format, transaction, or protocol changes.
- Parallel paths: Nereids rejects literal seed arguments, but BE/classic execution and BE constant-fold execution still need to be safe because this BE implementation is the final execution boundary.
- Conditional checks/error handling: Existing min/max validation is preserved, but the seed literal/non-constant invariant is still not enforced in BE.
- Tests: Existing regression tests cover Nereids analysis errors, but there is no added test covering the BE all-constant framework path affected by this override.
- Observability/performance: No new observability needed; performance impact is negligible relative to per-row RNG.
User focus: No additional user-provided review focus was specified.
| @@ -157,6 +159,8 @@ class FunctionUniform : public IFunction { | |||
| return Impl::get_variadic_argument_types(); | |||
There was a problem hiding this comment.
This override makes the default all-constant path keep min/max as ColumnConst but unwrap the seed argument to its nested data column. In PreparedFunctionImpl::default_implementation_for_constant_arguments, temporary_block.rows() is then taken from the first column, so for a row-producing query such as a classic-planner path that reaches BE with uniform(1, 100, 1), the temp block has min as ColumnConst(size = input_rows_count) and seed as ColumnInt64(size = 1). execute_impl loops input_rows_count rows and ColumnView treats the unwrapped seed as non-const, so value_at(1) and later read past the one-row seed column.
Nereids currently rejects a literal seed, but this BE function is also the execution boundary for non-Nereids/classic and BE constant-fold paths, and this PR specifically changes BE constant handling. Please either enforce the third argument must not be constant at BE execution/open time or disable/adjust the default constant-argument path so the seed remains a correctly sized ColumnConst rather than a one-row nested column.
There was a problem hiding this comment.
This override makes the default all-constant path keep
min/maxasColumnConstbut unwrap the seed argument to its nested data column. InPreparedFunctionImpl::default_implementation_for_constant_arguments,temporary_block.rows()is then taken from the first column, so for a row-producing query such as a classic-planner path that reaches BE withuniform(1, 100, 1), the temp block hasminasColumnConst(size = input_rows_count)and seed asColumnInt64(size = 1).execute_implloopsinput_rows_countrows andColumnViewtreats the unwrapped seed as non-const, sovalue_at(1)and later read past the one-row seed column.Nereids currently rejects a literal seed, but this BE function is also the execution boundary for non-Nereids/classic and BE constant-fold paths, and this PR specifically changes BE constant handling. Please either enforce the third argument must not be constant at BE execution/open time or disable/adjust the default constant-argument path so the seed remains a correctly sized
ColumnConstrather than a one-row nested column.
某个设计上的问题,目前的be的执行框架,在执行的时候,一个函数/表达式 返回会返回column const,或者非column const (有一些函数会根据runtime的value来直接返回一个column const 的column
|
run buildall |
|
run beut |
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
|
run buildall |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
TPC-H: Total hot run time: 31460 ms |
TPC-DS: Total hot run time: 168848 ms |
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
|
run buildall |
TPC-H: Total hot run time: 30774 ms |
TPC-DS: Total hot run time: 169726 ms |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
|
run cloud_p0 |
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
1 similar comment
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
|
PR approved by at least one committer and no changes requested. |
|
PR approved by anyone and no changes requested. |
|
/review |
There was a problem hiding this comment.
I completed the review against the full PR diff and the existing inline review context. I am requesting changes because the already-open thread on be/src/exprs/function/uniform.cpp remains a blocking correctness issue: the new {0, 1} get_arguments_that_are_always_constant() override still lets the default all-constant path unwrap a constant seed into a one-row nested column while execute_impl() iterates input_rows_count rows. I did not add another inline comment for that because it has already been raised in the existing review thread.
Critical checkpoint conclusions:
- Goal/test: the goal is to make
uniformhandle constant-argument behavior correctly while treating only min/max as always constant. The current code does not fully accomplish this for all-constant BE execution paths; the added regression only checks the double non-constant-seed row count and does not prove the problematic all-constant BE path is safe. - Scope/clarity: the code change is small and focused, but it relies on default constant handling in a way that is still unsafe for the seed column.
- Concurrency/lifecycle/config/compatibility: no new concurrency, special lifecycle, config, storage format, or FE-BE protocol compatibility concern found.
- Parallel paths: Nereids rejects literal seed, but BE execution/constant-fold/classic paths are parallel entry points and still need safe behavior at the BE boundary.
- Conditions/error handling: min/max validation is preserved; seed literal handling is not enforced at BE execution/open time.
- Test coverage: coverage is insufficient for the BE constant-argument failure mode; the new test only validates successful double execution count with
random()*10000. - Observability/performance/transaction/persistence: no additional issues found.
User focus: no additional user-provided review focus was specified.
What problem does this PR solve?
Issue Number: N/A
Problem Summary:
The uniform function takes three arguments: min, max, and seed. Only the first two (min, max) are truly "always constant" — the seed column should be treated as a
regular column, not a constant. Without overriding get_arguments_that_are_always_constant(), when a user passes a constant value as the third argument (seed), the
default framework logic treats it as a constant column, leading to incorrect results.
Root cause: the base class default get_arguments_that_are_always_constant() does not distinguish between the seed argument and the min/max arguments, so a constant
seed would be folded by the constant-handling path rather than being treated as a per-row value.
Fix:
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)